An improved Similarity Measure For Chinese Text Clustering

Authors
Abstract


Similar Resources

Improved Similarity Measure For Text Classification And Clustering

Computing the similarity between documents is an important operation in text processing. In this paper, a new similarity measure is proposed. To calculate the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: I) the same feature appears in both documents, II) the same feature appears in only one document, and III) ...

Striving for an Improved Audio Similarity Measure

In this submission to MIREX’07, we implement various modifications to the G1C algorithm by Elias Pampalk, which ranked first in last year’s MIREX AudioSim task. Although each modification showed only minor effects in our experiments, their combination consistently outperformed the original algorithm in our automated tests. Therefore, we consider it worth submitting the resulting algorithm ...

Text Clustering Using a Suffix Tree Similarity Measure

In the text mining area, popular methods use bag-of-words models, which represent a document as a vector. These methods ignore word-sequence information, and their good clustering results are limited to some special domains. This paper proposes a new similarity measure based on the suffix tree model of text documents. It analyzes the word-sequence information and then computes the similarity between ...

An Improved Algorithm for Text Document Clustering

Due to the advancement of the Internet, the volume of electronic documents available on the web is increasing day by day. Document clustering plays an important role in organizing and summarizing these documents. Thus, developing a fast and effective document clustering algorithm is of great importance. This paper presents an improved algorithm for document clustering. This algorithm is an ...

An improved semantic similarity measure for document clustering based on topic maps

A major computational burden in document clustering is calculating the similarity measure between pairs of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents, depending on the degree of similarity between them. A value of zero means that the documents are completely dissimilar, whereas a value of one indicates that ...

Journal

Journal title: DEStech Transactions on Engineering and Technology Research

Year: 2016

ISSN: 2475-885X

DOI: 10.12783/dtetr/icmite20162016/4588